(54) Computer aided visualisation of expression comparison



(57) Innovative systems and methods for visualizing 
information collected from analyzing samples are pro- 
vided. The samples may include nucleic acids, proteins, 
or other polymers. Gene expression level as determined 
from analysis of a nucleic acid sample is one possible 
analysis result that may be visualized. In one embodi-
ment, a computer system may display the expression
levels of multiple genes simultaneously in a way that fa- 
cilitates user identification of genes whose expression 
is significant to a characteristic such as disease or re- 
sistance to disease. Additionally, the computer system 
may facilitate display of further information about rele- 
vant genes once they are identified. 
Description

BACKGROUND OF THE INVENTION 

[0001] The present invention relates to the field of 
computer systems. More specifically, the present inven- 
tion relates to computer systems for visualizing analysis 
results. 

[0002] Devices and computer systems for forming 
and using arrays of materials on a substrate are known. 
For example, PCT Publication No. WO 92/10588, incor-
porated herein by reference for all purposes, describes
techniques for sequencing or sequence checking nucle- 
ic acids and other materials. Arrays for performing these 
operations may be formed according to the methods of, 
for example, the pioneering techniques disclosed in U. 
S. Patent No. 5,143,854 and U.S. Patent No. 5,593,839 
both incorporated herein by reference for all purposes. 
[0003] According to one aspect of the techniques de- 
scribed therein, an array of nucleic acid probes is fabri- 
cated at known locations on a substrate or chip. A fluo- 
rescently labeled nucleic acid is then brought into con- 
tact with the chip and a scanner generates an image file 
(which is processed into a cell file) indicating the loca- 
tions where the labeled nucleic acids bound to the chip. 
Based upon the cell file and identities of the probes at 
specific locations, it becomes possible to extract infor- 
mation such as the monomer sequence of DNA or RNA. 
Such systems have been used to form, for example, ar- 
rays of DNA that may be used to study and detect mu- 
tations relevant to cystic fibrosis, the P53 gene (relevant 
to certain cancers), HIV, and other genetic characteris- 
tics. 

[0004] Computer-aided techniques for monitoring 
gene expression using such arrays of probes have also 
been developed as disclosed in U.S. Patent Application 
No. 08/828,952 (Attorney Docket No. 16528X-
028900US) and PCT Publication No. WO 97/10365 (At-
torney Docket No. 16528X-017110PC), the contents of
which are herein incorporated by reference. Many dis- 
ease states are characterized by differences in the ex- 
pression levels of various genes either through changes 
in the copy number of the genetic DNA or through 
changes in levels of transcription (e.g., through control 
of initiation, provision of RNA precursors, RNA process- 
ing, etc.) of particular genes. For example, losses and 
gains of genetic material play an important role in ma- 
lignant transformation and progression. Furthermore, 
changes in the expression (transcription) levels of par- 
ticular genes (e.g., oncogenes or tumor suppressors), 
serve as signposts for the presence and progression of 
various cancers. 

[0005] It is desirable to identify genes having expres- 
sion levels relevant to diagnosis of a diseased state by 
analyzing the expression levels of large numbers of 
genes in both diseased and normal individuals. Methods 
for collecting the expression level information have been 
developed. However, the user interfaces for gene ex-
pression monitoring systems that have been developed
until now are designed to clearly present the expression
of particular pre-selected genes. A user seeking to iden-
tify, e.g., an oncogene or a tumor suppressor gene, must
individually review the expression level of large num-
bers of genes and compare the expression levels be-
tween diseased and normal individuals. What is needed
is a user interface that takes advantage of collected
gene expression information to help the user to identify
particular genes of interest.

SUMMARY OF THE INVENTION 

[0006] The present invention provides innovative sys-
tems and methods for visualizing information collected
from analyzing samples. The samples may include nu-
cleic acids, proteins, or other polymers. Gene expres-
sion level as determined from analysis of a nucleic acid
sample is one possible analysis result that may be vis-
ualized. In one embodiment, a computer system may
display the expression levels of multiple genes simulta-
neously in a way that facilitates user identification of
genes whose expression is significant to a characteristic
such as disease or resistance to disease. Additionally,
the computer system may facilitate display of further in-
formation about relevant genes once they are identified.
[0007] A first aspect of the invention provides a com-
puter-implemented method for presenting expression
level information as collected from first and second sam-
ples. The method includes steps of: displaying a first ax-
is corresponding to expression level in the first sample,
and displaying a second axis substantially perpendicu-
lar to the first axis, the second axis corresponding to ex-
pression level in the second sample. The method further
includes a step of: for a selected expressed sequence,
displaying a mark at a position. The position is selected
relative to the first axis in accordance with an expression
level of the selected expressed sequence in the first
sample and relative to the second axis in accordance
with an expression level of the selected expressed se-
quence in the second sample. A particularly useful ap-
plication is displaying many marks simultaneously for
many selected genes to discover which ones of the se-
lected genes may be relevant to the characteristic.

[0008] A second aspect of the invention provides a
computer-implemented method of presenting sample
analysis information. The method includes steps of: dis-
playing a first axis corresponding to a concentration of
a compound in a first sample as determined by monitor-
ing binding of the compound to a selected polymer hav-
ing binding affinity to the compound, and displaying a
second axis substantially perpendicular to the first axis.
The second axis corresponds to a concentration of the
compound in a second sample as determined by mon-
itoring binding of the compound to the selected polymer.
The method further preferably includes a step of display-
ing a mark at a position. The position is selected relative
to the first axis in accordance with the concentration in
the first sample and relative to the second axis in ac-
cordance with the concentration in the second sample.
[0009] A further understanding of the nature and ad- 
vantages of the inventions herein may be realized by 
reference to the remaining portions of the specification 
and the attached drawings. 

BRIEF DESCRIPTION OF THE DRAWINGS 

[0010] Fig. 1 illustrates an example of a computer sys- 
tem that may be used to execute software embodiments 
of the present invention. 

[0011] Fig. 2 shows a system block diagram of a typ- 
ical computer system. 

[0012] Fig. 3 illustrates an overall system for forming 
and analyzing arrays of polymers including biological 
materials such as DNA or RNA. 

[0013] Fig. 4 is an illustration of an embodiment of 
software for the overall system. 

[0014] Fig. 5 shows a flowchart of a process of mon- 
itoring the expression of a gene by comparing hybridi- 
zation intensities of pairs of perfect match and mismatch 
probes. 

[0015] Fig. 6 shows a screen display illustrating gene 
expression levels for multiple genes as collected from 
both normal and diseased tissue. 
[0016] Figs. 7A-7B show screen displays illustrating 
information about a particular gene selected from the 
display of Fig. 6. 

DESCRIPTION OF SPECIFIC EMBODIMENTS 

[0017] The present invention provides innovative
methods of monitoring and visualizing gene expression. In
the description that follows, the invention will be de- 
scribed in reference to preferred embodiments. Howev- 
er, the description is provided for purposes of illustration 
and not for limiting the spirit and scope of the invention. 
[0018] Fig. 1 illustrates an example of a computer sys- 
tem that may be used to execute software embodiments 
of the present invention. Fig. 1 shows a computer sys- 
tem 1 which includes a monitor 3, screen 5, cabinet 7, 
keyboard 9, and mouse 11. Mouse 11 may have one or
more buttons such as mouse buttons 13. Cabinet 7 
houses a CD-ROM drive 15 and a hard drive (not 
shown) that may be utilized to store and retrieve soft- 
ware programs including computer code incorporating 
the present invention. Although a CD-ROM 17 is shown 
as the computer readable medium, other computer 
readable media including floppy disks, DRAM, hard 
drives, flash memory, tape, and the like may be utilized. 
Cabinet 7 also houses familiar computer components 
(not shown) such as a processor, memory, and the like. 
[0019] Fig. 2 shows a system block diagram of com- 
puter system 1 used to execute software embodiments 
of the present invention. As in Fig. 1, computer system 
1 includes monitor 3 and keyboard 9. Computer system 
1 further includes subsystems such as a central proc-
essor 50, system memory 52, I/O controller 54, display
adapter 56, removable disk 58, fixed disk 60, network
interface 62, and speaker 64. Removable disk 58 is rep-
resentative of removable computer readable media like
floppies, tape, CD-ROM, removable hard drive, flash
memory, and the like. Fixed disk 60 is representative of 
an internal hard drive or the like. Other computer sys- 
tems suitable for use with the present invention may in- 
clude additional or fewer subsystems. For example, an- 
other computer system could include more than one 
processor 50 (i.e., a multi-processor system) or memory 
cache. 

[0020] Arrows such as 66 represent the system bus 
architecture of computer system 1. However, these ar-
rows are illustrative of any interconnection scheme serv-
ing to link the subsystems. For example, display adapter
56 may be connected to central processor 50 through a 
local bus or the system may include a memory cache. 
Computer system 1 shown in Fig. 2 is but an example 
of a computer system suitable for use with the present 
invention. Other configurations of subsystems suitable 
for use with the present invention will be readily appar- 
ent to one of ordinary skill in the art. In one embodiment, 
the computer system is an IBM compatible personal 
computer. 

[0021] The VLSIPS™ and GeneChip™ technologies
provide methods of making and using very large arrays 
of polymers, such as nucleic acids, on very small chips. 
See U.S. Patent No. 5,143,854 and PCT Patent Publi- 
cation Nos. WO 90/15070 and 92/10092, each of which 
is hereby incorporated by reference for all purposes. Nu- 
cleic acid probes on the chip are used to detect comple- 
mentary nucleic acid sequences in a sample nucleic ac- 
id of interest (the "target" nucleic acid). 
[0022] It should be understood that the probes need 
not be nucleic acid probes but may also be other recep- 
tors, such as antibodies, or polymers such as peptides. 
Peptide probes may be used to detect the concentration 
of other peptides, proteins, or other compounds in a 
sample. The probes must be carefully selected to have 
bonding affinity to the compound whose concentration 
they are to be used to measure. 

[0023] In one embodiment, the present invention pro- 
vides methods of visualizing information relating to the 
concentration of compounds in a sample as measured 
by monitoring affinity of the compounds to probes. In a 
particular application, the concentration information is 
generated by analysis of hybridization intensity files for 
a chip containing hybridized nucleic acid probes. The 
hybridization of a nucleic acid sample to certain probes 
may represent the expression level of one or more genes
or expressed sequence tags (ESTs). The expression 
level of a gene or EST is herein understood to be the 
concentration within a sample of mRNA or protein that 
would result from the transcription of the gene or EST. 
[0024] Expression level information visualized by vir- 
tue of the present invention need not be obtained from 
probes but may originate from any source. If the expres-
sion information is collected from a probe array, the
probe array need not meet any particular criteria for size 
and density. Furthermore, the present invention is not 
limited to visualizing fluorescent measurements of 
bondings such as hybridizations but may be readily uti- 
lized to visualize other measurements. 
[0025] Concentration of compounds other than nucle- 
ic acids may be visualized according to one embodiment 
of the present invention. For example, a probe array 
may include peptide probes which may be exposed to 
protein samples, polypeptide samples, or other com- 
pounds which may or may not bond to the peptide 
probes. By appropriate selection of the peptide probes, 
one may detect the presence or absence of particular 
compounds which would bond to the peptide probes. 
[0026] For purposes of illustration, the present inven- 
tion is described as being part of a system that designs 
a chip mask, synthesizes the probes on the chip, labels 
nucleic acids from a target sample, and scans the hy- 
bridized probes. Such a system is set forth in U.S. Pat- 
ent No. 5,571,639 which is hereby incorporated by ref- 
erence for all purposes. However, the present invention 
may be used separately from the overall system for an- 
alyzing data generated by such systems, such as at re- 
mote locations, or for visualizing the results of other sys- 
tems for generating expression information, or for visu- 
alizing concentrations of polymers other than nucleic ac- 
ids. 

[0027] Fig. 3 illustrates a computerized system for 
forming and analyzing arrays of biological materials 
such as RNA or DNA. A computer 100 is used to design
arrays of biological polymers such as RNA or DNA. The
computer 100 may be, for example, an appropriately
programmed IBM-compatible personal computer run-
ning Windows NT including appropriate memory and a
CPU as shown in Figs. 1 and 2. The computer system
100 obtains inputs from a user regarding characteristics
of a gene of interest, and other inputs regarding the de-
sired features of the array. Optionally, the computer sys-
tem may obtain information regarding a specific genetic
sequence of interest from an external or internal data-
base 102 such as GenBank. The output of the computer
system 100 is a set of chip design computer files 104 in
the form of, for example, a switch matrix, as described 
in PCT application WO 92/10092, and other associated 
computer files. 

[0028] The chip design files are provided to a system 
106 that designs the lithographic masks used in the fab-
rication of arrays of molecules such as DNA. The system
or process 106 may include the hardware necessary to 
manufacture masks 110 and also the necessary com- 
puter hardware and software 108 necessary to lay the 
mask patterns out on the mask in an efficient manner. 
As with the other features in Fig. 3, such equipment may 
or may not be located at the same physical site, but is 
shown together for ease of illustration in Fig. 3. The sys-
tem 106 generates masks 110 or other synthesis pat-
terns such as chrome-on-glass masks for use in the fab-
rication of polymer arrays.

[0029] The masks 110, as well as selected informa-
tion relating to the design of the chips from system 100,
are used in a synthesis system 112. Synthesis system
112 includes the necessary hardware and software
used to fabricate arrays of polymers on a substrate or
chip 114. For example, synthesizer 112 includes a light
source 116 and a chemical flow cell 118 on which the
substrate or chip 114 is placed. Mask 110 is placed be-
tween the light source and the substrate/chip, and the
two are translated relative to each other at appropriate
times for deprotection of selected regions of the chip.
Selected chemical reagents are directed through flow
cell 118 for coupling to deprotected regions, as well as
for washing and other operations. All operations are
preferably directed by an appropriately programmed
computer 119, which may or may not be the same com-
puter as the computer(s) used in mask design and mask
making.

[0030] The substrates fabricated by synthesis system
112 are optionally diced into smaller chips and exposed
to marked targets. The targets may or may not be com-
plementary to one or more of the molecules on the sub-
strate. The targets are marked with a label such as a
fluorescein label (indicated by an asterisk in Fig. 3) and
placed in scanning system 120. Scanning system 120
again operates under the direction of an appropriately
programmed digital computer 122, which also may or
may not be the same computer as the computers used
in synthesis, mask making, and mask design. The scan-
ner 120 includes a detection device 124 such as a con-
focal microscope or CCD (charge-coupled device) that
is used to detect the location where labeled target has
bound to the substrate. The output of scanner 120 is an
image file(s) 124 indicating, in the case of fluorescein
labeled target, the fluorescence intensity (photon counts
or other related measurements, such as voltage) as a
function of position on the substrate. Since higher pho-
ton counts will be observed where the labeled target has
bound more strongly to the array of polymers, and since
the monomer sequence of the polymers on the sub-
strate is known as a function of position, it becomes pos-
sible to determine the sequence(s) of polymer(s) on the
substrate that are complementary to the target.

[0031] The image file 124 is provided as input to an
analysis system 126 that incorporates the visualization
and analysis methods of the present invention. Again,
the analysis system may be any one of a wide variety
of computer systems. The present invention provides
various methods of analyzing and visualizing the chip
design files and the image files, providing appropriate
output 128. The chip design need not include any par-
ticular number of probes. It should be understood that
the present invention does not require any particular
source of expression level information.

[0032] Fig. 4 provides a simplified illustration of the 
overall software system used in the operation of one em-
bodiment of the invention. As shown in Fig. 4, the sys-
tem first identifies the nucleotide sequence(s) or targets
that would be of interest in a particular expression level 
analysis at step 202. The sequences of interest corre-
spond to mRNA transcripts of one or more genes, ESTs
or nucleic acids derived from the mRNA transcripts. Se- 
quence selection may be provided via manual input of 
text files or may be from external sources such as Gen- 
Bank. 

[0033] At step 204 the system evaluates the sequenc- 
es of interest to determine or assist the user in deter- 
mining which probes would be desirable on the chip, and 
provides an appropriate "layout" on the chip for the 
probes. The process of selecting probes for an expres- 
sion level analysis is explained in PCT Publication No. 
WO 97/10365, the contents of which are herein incor- 
porated by reference. An alternative probe selection 
process that does not require prior knowledge of se- 
quences of interest is explained in PCT Publication No. 
WO 97/27317 (Attorney Docket No. 18547-019410PC),
the contents of which are herein incorporated by refer-
ence. Further general background on probe selection is
found in PCT Publication No. WO 95/11995 (Attorney
Docket No. 18547-004111PC) and PCT Publication No.
WO 97/29212 (Attorney Docket No. 18547-018540PC),
the contents of which are herein incorporated by refer- 
ence. The term "perfect match probe" refers to a probe 
that has a sequence that is perfectly complementary to 
a particular target sequence. The test probe is typically 
perfectly complementary to a portion (subsequence) of 
the target sequence. The term "mismatch control" or
"mismatch probe" refers to probes whose sequence is
deliberately selected not to be perfectly complementary 
to a particular target sequence. For each mismatch 
(MM) control in an array there typically exists a corre- 
sponding perfect match (PM) probe that is perfectly 
complementary to the same particular target sequence. 
[0034] The process compares hybridization intensi- 
ties of pairs of perfect match and mismatch probes that 
are preferably covalently attached to the surface of a 
substrate or chip. Most preferably, the nucleic acid 
probes have a density greater than about 60 different 
nucleic acid probes per 1 cm² of the substrate.
[0035] Initially, nucleic acid probes are selected that 
are complementary to the target sequence. These 
probes are the perfect match probes. Another set of 
probes is specified that are intended to be not perfectly 
complementary to the target sequence. These probes 
are the mismatch probes and each mismatch probe in- 
cludes at least one nucleotide mismatch from a perfect 
match probe. Accordingly, a mismatch probe and the 
perfect match probe to which it is identical except for 
one base make up a pair. As mentioned earlier, the nu- 
cleotide mismatch is preferably near the center of the 
mismatch probe. 
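
As a hedged illustration of the pairing described above, the following Python sketch derives a perfect match probe and its mismatch partner from a target subsequence. The function names are hypothetical, and substituting the central base with its complement is an assumption made only for this example; the description requires only at least one mismatch, preferably near the center of the probe.

```python
# Watson-Crick complements for DNA bases.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def perfect_match_probe(target_subsequence):
    """Return the probe perfectly complementary to the target subsequence.

    The probe is written 5'->3', so it is the reverse complement.
    """
    return "".join(COMPLEMENT[base] for base in reversed(target_subsequence))


def mismatch_probe(pm_probe):
    """Return a probe identical to pm_probe except for its central base.

    ASSUMPTION: the central base is replaced by its complement. Any other
    substitution would also yield a single-base mismatch.
    """
    mid = len(pm_probe) // 2
    return pm_probe[:mid] + COMPLEMENT[pm_probe[mid]] + pm_probe[mid + 1:]


pm = perfect_match_probe("AAACCC")
mm = mismatch_probe(pm)
```

Each (pm, mm) result forms one probe pair of the kind whose hybridization intensities are later compared.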

[0036] The probe lengths of the perfect match probes 
are typically chosen to exhibit detectably greater hybrid- 
ization with the target sequence relative to the mismatch 
probes. For example, the nucleic acid probes may be
all 20-mers. However, probes of varying lengths may al-
so be synthesized on the substrate for any number of
reasons including resolving ambiguities. 
[0037] Again referring to Fig. 4, at step 206 the masks
for the synthesis are designed. At step 208 the software
utilizes the mask design and layout information to make
the DNA or other polymer chips. This step 208 will con-
trol, among other things, relative translation of a sub-
strate and the mask, the flow of desired reagents
through a flow cell, the synthesis temperature of the flow
cell, and other parameters. At step 210, another piece
of software is used in scanning a chip thus synthesized
and exposed to a labeled target. The software controls
the scanning of the chip, and stores the data thus ob-
tained in a file that may later be utilized to extract hy-
bridization information.

[0038] At step 212 a computer system utilizes the lay-
out information and the fluorescence information to
evaluate the hybridized nucleic acid probes on the chip.
Among the important pieces of information obtained
from DNA chips are the relative fluorescent intensities
obtained from the perfect match probes and mismatch
probes. These intensity levels are used to estimate an
expression level for a gene or EST. The computer sys-
tem used for analysis will preferably have available oth-
er details of the experiment including possibly the gene
name, gene sequence, probe sequences, probe loca-
tions on the substrate, and the like.
[0039] According to the present invention, at step 214,
the same computer system used for analysis or another
one displays the expression level information in a format
useful for identifying genes of interest. The visualized
expression level information may include information
collected from multiple applications of one or more pre-
vious steps of Fig. 4.

[0040] Fig. 5 is a flowchart describing steps of esti-
mating an expression level for a particular gene and de-
termining whether the expression level is sufficiently
high to be displayed. At step 952, the computer system
receives raw scan data of N pairs of perfect match and
mismatch probes. In a preferred embodiment, the hy-
bridization intensities are photon counts from a fluores-
cein labeled target that has hybridized to the probes on
the substrate. For simplicity, the hybridization intensity
of a perfect match probe will be designated "I_pm" and the
hybridization intensity of a mismatch probe will be des-
ignated "I_mm."

[0041] Hybridization intensities for a pair of probes are
retrieved at step 954. The background signal intensity
is subtracted from each of the hybridization intensities
of the pair at step 956. Background subtraction can also
be performed on all the raw scan data at the same time.
[0042] At step 958, the hybridization intensities of the
pair of probes are compared to a difference threshold
(D) and a ratio threshold (R). It is determined if the dif-
ference between the hybridization intensities of the pair
(I_pm - I_mm) is greater than or equal to the difference
threshold AND the quotient of the hybridization intensi-
ties of the pair (I_pm / I_mm) is greater than or equal to the
ratio threshold. The thresholds are typically
user-defined values that have been determined to pro-
duce accurate expression monitoring of a gene or
genes. In one embodiment, the difference threshold is
20 and the ratio threshold is 1.2.
[0043] If I_pm - I_mm >= D and I_pm / I_mm >= R, the value
NPOS is incremented at step 960. In general, NPOS is 
a value that indicates the number of pairs of probes 
which have hybridization intensities indicating that the 
gene is likely expressed. NPOS is utilized in a determi- 
nation of the expression of the gene. 
[0044] At step 962, it is determined if I_mm - I_pm >= D
and I_mm / I_pm >= R. If these expressions are true, the
value NNEG is incremented at step 964. In general, 
NNEG is a value that indicates the number of pairs of 
probes which have hybridization intensities indicating 
that the gene is likely not expressed. NNEG, like NPOS, 
is utilized in a determination of the expression of the 
gene. 

[0045] For each pair that exhibits hybridization inten- 
sities either indicating the gene is expressed or not ex- 
pressed, a log ratio value (LR) and intensity difference 
value (IDIF) are calculated at step 966. LR is calculated 
by the log of the quotient of the hybridization intensities 
of the pair (I_pm / I_mm). The IDIF is calculated by the dif-
ference between the hybridization intensities of the pair
(I_pm - I_mm). If there is a next pair of hybridization inten-
sities at step 968, they are retrieved at step 954.
[0046] At step 972, a decision matrix is utilized to in- 
dicate if the gene is expressed. The decision matrix uti- 
lizes the values N, NPOS, NNEG, LR (multiple LRs), 
and IDIF (multiple IDIFs). The following four assign- 
ments are performed: 

P1 = NPOS / NNEG

P2 = NPOS / N

P3 = SUM(LR) / N

P4 = SUM(IDIF) / N

These P values are then utilized to determine if the gene 
is expressed and if the expression level should be dis- 
played. In a preferred embodiment, the expression level 
of a gene should be displayed if: 

P1 > 2.2

P2 > 0.3

P3 > 0.8

P4 > 30

[0047] Once all the pairs of probes have been proc-
essed and the expression of the gene indicated, an av-
erage of the IDIF values for the probes that incremented
NPOS or NNEG is calculated at step 975, which is uti-
lized as an expression level. Of course, other values in-
cluding one of P1 through P4 could be used to indicate
expression level.
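
The steps of Fig. 5 described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name, the dictionary keys, and the `floor` parameter are hypothetical. The logarithm base (10 here), the clamping of background-subtracted intensities to a small positive floor to keep ratios defined, the handling of NNEG = 0, and the requirement that all four P criteria hold together are choices the text does not specify.

```python
import math


def evaluate_gene(pairs, background=0.0, d=20.0, r=1.2, floor=1.0):
    """Estimate expression from (I_pm, I_mm) raw intensity pairs.

    d and r are the difference and ratio thresholds (20 and 1.2 in the
    embodiment described). `floor` is an ASSUMPTION that avoids division
    by zero after background subtraction.
    """
    n = len(pairs)
    npos = nneg = 0
    log_ratios = []   # LR values for pairs that incremented NPOS or NNEG
    differences = []  # IDIF values for the same pairs
    for raw_pm, raw_mm in pairs:
        # Step 956: background subtraction (clamped, see assumption above).
        i_pm = max(raw_pm - background, floor)
        i_mm = max(raw_mm - background, floor)
        if i_pm - i_mm >= d and i_pm / i_mm >= r:      # step 958
            npos += 1
        elif i_mm - i_pm >= d and i_mm / i_pm >= r:    # step 962
            nneg += 1
        else:
            continue
        # Step 966: LR and IDIF for pairs indicating expression or not.
        log_ratios.append(math.log10(i_pm / i_mm))  # ASSUMPTION: base 10
        differences.append(i_pm - i_mm)

    # Step 972: decision matrix.
    if nneg:
        p1 = npos / nneg
    else:
        p1 = math.inf if npos else 0.0  # ASSUMPTION for NNEG == 0
    p2 = npos / n if n else 0.0
    p3 = sum(log_ratios) / n if n else 0.0
    p4 = sum(differences) / n if n else 0.0
    # ASSUMPTION: all four criteria must hold for display.
    display = p1 > 2.2 and p2 > 0.3 and p3 > 0.8 and p4 > 30

    # Step 975: average IDIF over the pairs that were counted.
    level = sum(differences) / len(differences) if differences else 0.0
    return {"npos": npos, "nneg": nneg, "p1": p1, "p2": p2, "p3": p3,
            "p4": p4, "display": display, "level": level}


result = evaluate_gene([(1000, 100), (2000, 200), (500, 50), (900, 90)])
```

In this example every pair has a perfect match intensity ten times its mismatch intensity, so all four pairs increment NPOS and the gene qualifies for display.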

[0048] For simplicity, Fig. 5 was described in refer-
ence to a single gene or EST. However, the visualization
system of the present invention displays expression re-
sults for many genes to facilitate discovery of genes or
ESTs of interest. Furthermore, the present invention
contemplates display of expression levels of a single
gene or EST as collected from two or more different
samples such as tissue samples. The sample sources
preferably differ in some characteristic. It will be under-
stood that when the term "sample" is used herein, meas-
urements made on a single "sample" can be based on
an aggregation of multiple sample collection events or
even multiple organisms.

[0049] Fig. 6 shows a screen display illustrating gene
expression levels for multiple genes as collected from
two tissue samples. A displayed horizontal axis 1002
represents expression level measured in one or more
nucleic acid samples taken from the first tissue sample.
A displayed vertical axis 1004 represents expression
level in one or more nucleic acid samples taken from the
second tissue sample. Each of marks 1006 represents a
particular gene whose expression level has been meas-
ured in both the first and second tissue samples. Each
mark 1006 is placed at a distance from vertical axis 1004
corresponding to expression level in the first tissue sam-
ple and at a distance from the horizontal axis 1002 cor-
responding to expression level in the second tissue
sample.

[0050] The expression levels used for determining the
position of marks 1006 are preferably taken from the re-
sult of step 975. The position of each of marks 1006 de-
pends on two iterations of the steps of Fig. 5, once for
the sample taken from the first tissue sample and once
for the sample taken from the second tissue sample.
However, a mark is preferably displayed only if one of
the samples meets the threshold criteria at step 972.
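
The placement rule for the marks can be illustrated with a short Python sketch. The function name, the input layout, and the pixel conventions are hypothetical. Linear scaling, a square plot area with the origin at the lower left, clamping of negative levels to zero, and reading "one of the samples" as "at least one" are assumptions of this example.

```python
def build_marks(genes, plot_size=400, max_level=None):
    """Compute screen positions for the marks of a two-sample display.

    genes maps a gene name to ((level_first, shown_first),
    (level_second, shown_second)), where each level is an expression
    level and each flag says whether that sample met the display criteria.
    Returns a list of dicts with pixel coordinates (y grows downward).
    """
    # Keep a gene if at least one sample met the criteria (ASSUMPTION).
    shown = {
        name: (first[0], second[0])
        for name, (first, second) in genes.items()
        if first[1] or second[1]
    }
    if not shown:
        return []
    if max_level is None:
        max_level = max(max(levels) for levels in shown.values())
    scale = plot_size / max_level if max_level > 0 else 0.0

    marks = []
    for name, (level_first, level_second) in shown.items():
        # Distance from the vertical axis follows the first sample,
        # distance from the horizontal axis follows the second sample.
        x = min(max(level_first, 0.0), max_level) * scale
        y = min(max(level_second, 0.0), max_level) * scale
        marks.append({"gene": name,
                      "x_px": round(x),
                      "y_px": round(plot_size - y)})
    return marks


example_genes = {
    "g1": ((100.0, True), (100.0, True)),
    "g2": ((200.0, True), (50.0, False)),
    "g3": ((10.0, False), (20.0, False)),  # neither sample qualifies
}
example_marks = build_marks(example_genes)
```

Gene g3 is omitted because neither of its samples met the display criteria, while g1 lands on the diagonal because its two levels are equal.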
[0051] In the depicted representative screen display,
the first tissue sample is a cancerous tissue sample and
the second tissue sample is a normal tissue sample. The
individual marks represent the expression levels of se-
lected genes in both cancerous and normal tissue. A first
group of marks 1008 represent genes that are neither
tumor suppressors nor oncogenes since their expres-
sion levels are roughly similar for both normal and can-
cerous tissue. These marks 1008 fall roughly along a
line which is rotated 45 degrees from each of the axes.
A second group of marks 1010 represent genes that are
likely oncogenes since their expression levels are found
to be significantly higher in cancerous tissue than in nor-
mal tissue. A third group of marks 1012 represent genes
that are likely tumor suppressors since their expression
levels are found to be significantly higher in normal tis-
sue than in cancerous tissue. It will be appreciated that
expression levels for large numbers of genes can be re-
viewed at once to discover the oncogenes and tumor
suppressors.
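
The three groups of marks described above are identified visually by the user. Purely as an illustration of the same comparison, the following Python sketch labels a gene from its two expression levels. The function name and the two-fold cutoff are hypothetical; the description says only "significantly higher" and gives no numeric criterion. Levels are assumed to be non-negative.

```python
def classify_mark(level_cancerous, level_normal, fold=2.0):
    """Label a gene by comparing its expression in the two tissues.

    ASSUMPTION: "significantly higher" is modeled as more than `fold`
    times the other level; the source states no numeric cutoff.
    """
    if level_cancerous > fold * level_normal:
        return "likely oncogene"            # corresponds to marks 1010
    if level_normal > fold * level_cancerous:
        return "likely tumor suppressor"    # corresponds to marks 1012
    return "similar expression"             # marks 1008, near the diagonal


labels = [classify_mark(900, 100), classify_mark(100, 900),
          classify_mark(100, 120)]
```

A label of this kind could supplement the display, but the description relies on the position of each mark relative to the 45 degree line rather than on a fixed threshold.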

[0052] Although in the depicted display, the two types 
of tissue are normal tissue and cancerous tissue, the 
present invention would aid in the discovery of genes 
whose expression is associated with any characteristic 
that varies among tissue samples. For example, one
can compare expression results from tissue from indi-
viduals who have been exposed to HIV but remain unin-
fected to tissue obtained from infected individuals to
identify genes conferring resistance to HIV. One can 
compare expression results between tissue from plants 
that survive drought to plants that do not. One can com- 
pare expression levels among tissue samples at suc- 
cessive stages or severity levels of the same disease, 
among tissue samples where different ultimate out- 
comes of the disease (e.g., patient death or remission) 
are known, among diseased tissue samples that have 
been subject to different treatment regimes including e. 
g, chemotherapy, antisense RNA, etc. For cancers, one 
can compare expression levels between malignant cells 
and non-malignant cells. Also expression levels can be 
compared among different organs, between species, 
and among different stages of development of an organ. 
[0053] It will be appreciated that the present invention 
also encompasses displays with more than two dimen- 
sions. A third visual dimension can be used to illustrate 
expression level from a third tissue sample. The time 
dimension can also be used to illustrate successive 
groups of two or three tissue samples at successive time 
periods. The time dimension can also be used to correspond
to tissue samples obtained at, e.g., successive
stages of a disease.

[0054] Other interface methods corresponding to hu- 
man senses other than sight can also be incorporated 
within the presentation system of the present invention. 
The senses may correspond to additional dimensions. 
For example, marks can be displayed in succession,
accompanied by a sound having characteristics corresponding
to expression level in another tissue sample.
[0055] The user can employ a cursor 1014 to identify
a particular mark as being of interest. Cursor 1014 can
be moved to a particular mark by use of, e.g., mouse
11. Once cursor 1014 is over a mark of interest, the mark
can be selected by, e.g., depression of one of mouse
buttons 13. Selection of a particular mark can be
facilitated by use of a zoom display feature (not shown).
Once a particular mark is selected, further information 
is displayed about the gene represented by the mark. A 
special mouse can transmit a tactile sensation back to 
the user corresponding to expression level in a tissue 
sample as the user passes the mouse over a corre- 
sponding mark. 
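
Selection of a mark under the cursor can be sketched as a nearest-mark search in screen coordinates. This is a minimal illustration only. The function `pick_mark`, the dictionary layout of `marks`, and the pick `radius` in pixels are hypothetical and are not taken from the disclosure.

```python
from math import hypot

def pick_mark(marks, cursor_x, cursor_y, radius=5.0):
    """Return the id of the mark nearest the cursor, or None if no mark
    lies within `radius` pixels. `marks` maps mark id -> (x, y)."""
    best_id, best_dist = None, radius
    for mark_id, (x, y) in marks.items():
        d = hypot(x - cursor_x, y - cursor_y)
        if d <= best_dist:
            best_id, best_dist = mark_id, d
    return best_id
```

A zoom feature of the kind mentioned above would, under this sketch, simply spread the screen coordinates apart so that fewer marks fall within the same pick radius.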

[0056] It will be appreciated that the display of Fig. 6 
is not limited to expression information. The two dimen- 
sions of Fig. 6 may correspond to indicators of the pres- 
ence of various polymers other than nucleic acids in two 
different samples. For example, each mark may correspond
to a different polymer, polypeptide, or other compound.
The distance of the mark from each axis would
correspond to a measure of presence of the particular
polymer in the sample corresponding to the axis. One
possible measure is produced by fluorescently tagging
polymer samples such as protein samples and exposing
a probe array such as a peptide probe array to the
protein samples. The fluorescent intensity of the probes will
then correspond to the bonding affinity of the sample to
the probes. The intensity measurement or a measurement
derived from the intensity measurement may then
be used to position the marks of Fig. 6.
[0057] Fig. 7A shows a screen display giving information
about a particular gene selected from the display of
Fig. 6. A cluster number 702, a GenBank accession
number 704, and a verbal description 706 for the selected
gene are displayed. The user can also select a
number of marks 1006 by circling them with cursor 1014.
Then a list of information as shown in Fig. 7A is
displayed for all the genes corresponding to the selected
marks.
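
Selecting several marks by circling them can be sketched as a point-in-polygon test applied to each mark, with the cursor path treated as a closed polygon. The ray-casting test below is a common technique chosen for illustration; the text does not state how the circled region is evaluated, and the names `inside` and `select_marks` are hypothetical.

```python
def inside(polygon, x, y):
    """Ray-casting test: True if (x, y) lies inside the closed polygon,
    given as a list of (x, y) vertices."""
    hit = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                hit = not hit
    return hit

def select_marks(marks, polygon):
    """Return ids of all marks whose positions fall inside the circled region."""
    return [mark_id for mark_id, (x, y) in marks.items() if inside(polygon, x, y)]
```

The returned ids could then be used to build the list of information of the kind shown in Fig. 7A.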

[0058] By selecting GenBank accession number 704
with another cursor (not shown), the user can direct
retrieval of the GenBank information for the selected gene.
If the GenBank information is not available locally, the
retrieval process can include formulating a query and
transmitting the query to a GenBank web site. Once the
GenBank information is retrieved, it can also be
displayed. Fig. 7B depicts the GenBank information for the
gene identified in Fig. 7A.
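
The local-then-remote retrieval described above can be sketched as follows. This is an illustration under stated assumptions: the text names only "a GenBank web site", so the use of the present-day NCBI E-utilities `efetch` endpoint, the `nuccore` database, and the function name `genbank_lookup` are assumptions, and the sketch only formulates the query without transmitting it.

```python
from urllib.parse import urlencode

# Assumption: NCBI E-utilities EFetch is used as the remote GenBank service.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

def genbank_lookup(accession, local_records):
    """Return ("local", record) when the record is held locally,
    otherwise ("remote", query_url) for the caller to transmit."""
    record = local_records.get(accession)
    if record is not None:
        return "local", record
    query = urlencode({
        "db": "nuccore",     # nucleotide records
        "id": accession,
        "rettype": "gb",     # GenBank flat-file format
        "retmode": "text",
    })
    return "remote", f"{EFETCH_URL}?{query}"
```

Keeping query formulation separate from transmission lets the display code decide when a network request is acceptable.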

[0059] In the foregoing specification, the invention 
has been described with reference to specific exemplary 
embodiments thereof. It will, however, be evident that 
various modifications and changes may be made thereunto
without departing from the broader spirit and scope
of the invention as set forth in the appended claims and
their full scope of equivalents.



1. A computer-implemented method of presenting
expression level information as collected from first and
second samples, said method comprising the steps
of:

displaying a first axis corresponding to expression
level in said first sample;

displaying a second axis substantially perpendicular
to said first axis, said second axis corresponding
to expression level in said second
sample; and

for a selected expressed sequence, displaying
a mark at a position, wherein said position is
selected relative to said first axis in accordance
with an expression level of said selected expressed
sequence in said first sample and relative
to said second axis in accordance with an
expression level of said selected expressed
sequence in said second sample.

2. The method of claim 1 wherein said selected ex- 
pressed sequence comprises a gene. 

3. The method of claim 1 wherein said selected ex- 
pressed sequence comprises a portion of a gene. 

4. The method of claim 1 further comprising the step 
of repeating said displaying a mark step for a plu- 
rality of selected expressed sequences. 

5. The method of claim 1 further comprising the steps 
of: 

monitoring said expression level of said ex- 
pressed sequence in said first sample and said sec- 
ond sample. 

6. The method of claim 3 wherein said monitoring step 
for one of said samples comprises substeps of: 

inputting a plurality of hybridization intensities 
of pairs of perfect match and mismatch probes, 
said perfect match probes being perfectly com- 
plementary to a target nucleic acid sequence 
indicative of expression of said selected gene 
and said mismatch probes having at least one 
base mismatch with said target sequence, and 
said hybridization intensities indicating hybrid- 
ization affinity between said perfect match and 
mismatch probes and a sample nucleic acid se- 
quence from said one of said samples; 
comparing the hybridization intensities of each 
pair of perfect match probe and mismatch 
probe; and 

generating said expression level for said ex- 
pressed sequence and said one of said sam- 
ples responsive to results of said comparing 
step. 

7. The method of claim 6 further comprising the step 
of: 

comparing a difference between hybridization 
intensities of perfect match and mismatch probes 
at a base position to a difference threshold. 

8. The method of claim 7 further comprising the step 
of: 

comparing a quotient of hybridization intensi- 
ties of perfect match and mismatch probes at a base 
position to a ratio threshold. 

9. The method of claim 6 further comprising the steps
of:

a) counting a probe pair as a positive probe pair
to increment a positive probe pair count if a perfect
match probe intensity minus a mismatch
probe intensity exceeds a difference threshold
and said perfect match probe intensity divided
by said mismatch probe intensity exceeds a ratio
threshold;

b) counting said probe pair as a negative probe
pair to increment a negative probe pair count if
said mismatch probe intensity minus said perfect
match probe intensity exceeds said difference
threshold and said mismatch probe intensity
divided by said perfect match probe intensity
exceeds said ratio threshold; and

c) computing a logarithmic ratio of said perfect
match probe intensity to said mismatch probe
intensity.

10. The method of claim 9 further comprising the steps
of:

repeating said a), b), and c) steps for each of
said probe pairs, accumulating a sum of differences
of said perfect match and mismatch
probe intensities for probe pairs that cause; and

determining an expression level of said selected
expressed sequence to be an average of
said differences.

11. The method of claim 1 further comprising the steps
of:

receiving user input selecting said mark; and

in response to said user input, displaying information
about said selected expressed sequence.

12. The method of claim 11 further comprising the steps
of:

in response to said user input, displaying information
about said selected expressed sequence.

13. The method of claim 12 wherein said information 
about said selected expressed sequence compris- 
es a GenBank accession number. 

14. The method of claim 12 wherein said information 
about said selected expressed sequence compris- 
es a GenBank database record for said selected ex- 
pressed sequence. 

15. The method of claim 1 wherein said first sample and
said second sample are collected from tissue samples
differing in a particular characteristic.

16. The method of claim 15 wherein said particular
characteristic comprises presence of disease.

17. The method of claim 15 wherein said particular
characteristic comprises a treatment strategy for a
disease.

18. The method of claim 1 wherein said particular char- 
acteristic is a stage of a disease. 

19. The method of claim 1 further comprising the step
of:

displaying a third axis substantially perpen- 
dicular to said first axis and to said second axis in 
a three-dimensional display environment wherein 
said position of said mark is further selected relative 
to said third axis in accordance with an expression 
level of said selected expressed sequence in a third 
sample. 

20. A computer-implemented method of presenting 
sample analysis information comprising the steps 
of: 

displaying a first axis corresponding to a con- 
centration of a compound in a first sample as 
determined by monitoring binding of said com- 
pound to a selected polymer having binding af- 
finity to said compound; 

displaying a second axis substantially perpen- 
dicular to said first axis, said second axis cor- 
responding to a concentration of said com- 
pound in said second sample as determined by 
monitoring binding of said compound to said 
selected polymer; and 

displaying a mark at a position, wherein said 
position is selected relative to said first axis in 
accordance with said concentration in said first 
sample and relative to said second axis in ac- 
cordance with said concentration in said sec- 
ond sample. 

21. The method of claim 20 wherein said selected pol- 
ymer comprises a nucleic acid sequence. 

22. The method of claim 20 wherein said selected pol- 
ymer comprises a protein. 

23. The method of claim 21 further comprising the step 
of: 

obtaining said concentration of said com- 
pound in said first sample by exposing said first 
sample to a plurality of nucleic acid probes. 

24. The method of claim 22 further comprising the step 
of: 

obtaining said concentration of said com- 
pound in said first sample by exposing said first 
sample to a plurality of peptide probes. 

25. A computer program product for presenting expression
level information as collected from first and
second samples, said product comprising:

code for displaying a first axis corresponding to
expression level in said first sample;

code for displaying a second axis substantially
perpendicular to said first axis, said second axis
corresponding to expression level in said
second sample;

code for, for a selected expressed sequence,
displaying a mark at a position, wherein said
position is selected relative to said first axis in
accordance with an expression level of said selected
expressed sequence in said first sample
and relative to said second axis in accordance
with an expression level of said selected expressed
sequence in said second sample; and

a computer-readable storage medium for storing
the codes.

26. The product of claim 25 wherein said selected
expressed sequence comprises a gene.

27. The product of claim 25 wherein said selected ex- 
pressed sequence comprises a portion of a gene. 

28. The product of claim 25 further comprising code for 
repeatedly applying said displaying a mark code for 
a plurality of selected expressed sequences. 

29. The product of claim 25 further comprising:

code for monitoring said expression level of 
said expressed sequence in said first sample and 
said second sample. 

30. The product of claim 27 wherein said monitoring
step for one of said samples comprises:

code for inputting a plurality of hybridization
intensities of pairs of perfect match and mismatch
probes, said perfect match probes being perfectly
complementary to a target nucleic acid
sequence indicative of expression of said selected
gene and said mismatch probes having
at least one base mismatch with said target
sequence, and said hybridization intensities
indicating hybridization affinity between said perfect
match and mismatch probes and a sample
nucleic acid sequence from said one of said
samples;

comparing the hybridization intensities of each
pair of perfect match probe and mismatch
probe; and

generating said expression level for said expressed
sequence and said one of said samples
responsive to results of said comparing
step.

31. The product of claim 30 further comprising:

code for comparing a difference between hybridization
intensities of perfect match and mismatch
probes at a base position to a difference
threshold.

32. The product of claim 31 further comprising: 

code for comparing a quotient of hybridization 
intensities of perfect match and mismatch probes 
at a base position to a ratio threshold. 

33. The product of claim 30 further comprising: 

a) code for counting a probe pair as a positive 
probe pair to increment a positive probe pair 
count if a perfect match probe intensity minus 
a mismatch probe intensity exceeds a differ- 
ence threshold and said perfect match probe 
intensity divided by said mismatch probe inten- 
sity exceeds a ratio threshold; 

b) code for counting said probe pair as a neg- 
ative probe pair to increment a negative probe 
pair count if said mismatch probe intensity mi- 
nus said perfect match probe intensity exceeds 
said difference threshold and said mismatch 
probe intensity divided by said perfect match 
probe intensity exceeds said ratio threshold; 
and 

c) code for computing a logarithmic ratio of said 
perfect match probe intensity to said mismatch 
probe intensity. 

34. The product of claim 33 further comprising: 

code for repeatedly applying said a), b), and c) 
codes for each of said probe pairs, accumulat- 
ing a sum of differences of said perfect match 
and mismatch probe intensities for probe pairs 
that cause; and 

code for determining an expression level of said 
selected expressed sequence to be an average 
of said differences. 

35. The product of claim 25 further comprising: 

code for receiving user input selecting said 
mark; and 

code for, in response to said user input, display- 
ing information about said selected expressed 
sequence. 

36. The product of claim 35 further comprising: 

code for, in response to said user input, dis- 
playing information about said selected expressed 
sequence. 

37. The product of claim 36 wherein said information 
about said selected expressed sequence compris- 
es a GenBank accession number. 



38. The product of claim 36 wherein said information 
about said selected expressed sequence compris- 
es a GenBank database record for said selected ex- 
pressed sequence. 

39. The product of claim 25 wherein said first sample 
and said second sample are collected from tissue 
samples differing in a particular characteristic. 

40. The product of claim 39 wherein said particular
characteristic comprises presence of disease. 

41. The product of claim 39 wherein said particular
characteristic comprises a treatment strategy for a
disease.

42. The product of claim 25 wherein said particular 
characteristic is a stage of a disease. 

43. The product of claim 25 further comprising the step
of:

displaying a third axis substantially perpendicular
to said first axis and to said second axis in
a three-dimensional display environment wherein
said position of said mark is further selected relative
to said third axis in accordance with an expression
level of said selected expressed sequence in a third
sample.

44. A computer program product for presenting sample
analysis information comprising:

code for displaying a first axis corresponding to
a concentration of a compound in a first sample
as determined by monitoring binding of said
compound to a selected polymer having bonding
affinity to said compound;

code for displaying a second axis substantially
perpendicular to said first axis, said second axis
corresponding to concentration of said compound
in a second sample as determined by
monitoring binding of said compound to said
selected polymer;

code for displaying a mark at a position, wherein
said position is selected relative to said first
axis in accordance with said concentration in
said first sample and relative to said second axis
in accordance with said concentration in said
second sample; and

a computer-readable storage medium that
stores the codes.

45. The product of claim 44 wherein said selected pol- 
ymer comprises a nucleic acid sequence. 

46. The product of claim 44 wherein said selected pol- 
ymer comprises a protein. 
47. A computer system comprising a display, a processor,
and a memory that stores instructions for configuring
said processor to:

display a first axis corresponding to expression 
level in said first sample; 
display a second axis substantially perpendic- 
ular to said first axis, said second axis corre- 
sponding to expression level in said second 
sample; and 

for a selected expressed sequence, display a 
mark at a position, wherein said position is se- 
lected relative to said first axis in accordance 
with an expression level of said selected ex- 
pressed sequence in said first sample and rel- 
ative to said second axis in accordance with an 
expression level of said selected expressed se- 
quence in said second sample. 

48. A computer system comprising a display, a proces- 
sor, and a memory that stores instructions for con- 
figuring said processor to: 

display a first axis corresponding to a concen- 
tration of a compound in a first sample as de- 
termined by monitoring binding of said com- 
pound to a selected polymer having binding af- 
finity to said compound; 

display a second axis substantially perpendic- 
ular to said first axis, said second axis corre- 
sponding to a concentration of said compound 
in said second sample as determined by mon- 
itoring binding of said compound to said select- 
ed polymer; and 

display a mark at a position, wherein said posi- 
tion is selected relative to said first axis in ac- 
cordance with said concentration in said first 
sample and relative to said second axis in ac- 
cordance with said concentration in said sec- 
ond sample. 

49. A method of monitoring gene expression in first and 
second samples, the method comprising presenting 
expression level information as collected from said 
first and second samples, in accordance with any 
of claims 1 to 19, and using the displayed mark to 
monitor said gene expression. 

50. A method of analysing first and second samples, 
the method comprising presenting sample analysis 
information relating to said first and second sam- 
ples, in accordance with any of claims 20 to 24, and 
using the displayed mark to analyse said samples. 

51. A method of identifying a gene of interest, the method
comprising presenting expression level information
relating to a gene, as collected from said first
and second samples, in accordance with any of
claims 1 to 19, and using the displayed mark to identify
whether said gene is of interest.

52. A method of identifying a gene having different
expression levels in first and second samples, the
method comprising presenting expression level information
relating to said gene in accordance with
any of claims 1 to 19, said expression level information
being collected from said first and second samples,
and using the displayed mark to identify said
gene.

53. A method of identifying a gene having an effect on
a characteristic of a tissue sample, the method comprising
presenting expression level information relating
to said gene in accordance with any of claims
1 to 19, said expression level information being collected
from said first and second samples differing
in said characteristic, and using the displayed mark
to identify said gene.

54. A method of identifying a compound having different
concentrations in first and second samples, the
method comprising presenting sample analysis information
relating to said first and second samples,
in accordance with any of claims 20 to 24, and using
the displayed mark to analyse said samples.
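
By way of illustration only, the probe pair steps a), b), and c) recited in claims 9 and 10 (and the corresponding code of claims 33 and 34) can be sketched in Python. This sketch forms no part of the claims. The function name `summarize_probe_pairs`, the default threshold values, and the base-10 logarithm are assumptions. Because the accumulating clause reads only "for probe pairs that cause", the sketch assumes the average is taken over all probe pairs.

```python
from math import log10

def summarize_probe_pairs(pairs, diff_threshold=20.0, ratio_threshold=1.2):
    """Tally probe pairs and average the PM - MM differences.

    pairs: list of (perfect_match, mismatch) intensities, all > 0.
    The default thresholds are illustrative assumptions.
    """
    if not pairs:
        raise ValueError("at least one probe pair is required")
    positive = negative = 0
    log_ratios = []
    diff_sum = 0.0
    for pm, mm in pairs:
        # a) positive pair: PM exceeds MM by both the difference and ratio tests
        if pm - mm > diff_threshold and pm / mm > ratio_threshold:
            positive += 1
        # b) negative pair: the same tests with PM and MM exchanged
        elif mm - pm > diff_threshold and mm / pm > ratio_threshold:
            negative += 1
        # c) logarithmic ratio of PM to MM (base 10 is an assumption)
        log_ratios.append(log10(pm / mm))
        diff_sum += pm - mm
    return {
        "positive": positive,
        "negative": negative,
        "log_ratios": log_ratios,
        # Assumption: averaged over all pairs, as the source clause is incomplete
        "expression_level": diff_sum / len(pairs),
    }
```

The resulting `expression_level` for each of two samples would supply the two coordinates of a mark in the display of claim 1.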